On the Shortest Common Superstring of NGS Reads
نویسندگان
چکیده
The Shortest Superstring Problem (SSP) consists, for a set of strings S = {s1, · · · , sn}, to find a minimum length string that contains all si, 1 ≤ i ≤ k, as substrings. This problem is proved to be NP-Complete and APX-hard. Guaranteed approximation algorithms have been proposed, the current best ratio being 2 11 23 , which has been achieved following a long and difficult quest. However, SSP is highly used in practice on next generation sequencing (NGS) data, which plays an increasingly important role in sequencing. In this note, we show that the SSP approximation ratio can be improved on NGS reads by assuming specific characteristics of NGS data that are experimentally verified on a very large sampling set.
منابع مشابه
انتخاب کوچکترین ابر رشته در DNA با استفاده از الگوریتم ازدحام ذرّات
A DNA string can be supposed a very long string on alphabet with 4 letters. Numerous scientists attempt in decoding of this string. since this string is very long , a shorter section of it that have overlapping on each other will be decoded .There is no information for the right position of these sections on main DNA string. It seems that the shortest string (substring of the main DNA string) i...
متن کاملOn Reoptimization of the Shortest Common Superstring Problem
In general, a reoptimization gives us a possibility to obtain a solution for a larger instance from a solution for a smaller instance. In this paper, we consider a possibility of usage of a reoptimization to solve the shortest common superstring problem.
متن کاملThe Shortest Common Superstring Problem
We consider the problem of the shortest common superstring. We describe an approach to solve the problem. This approach is based on an explicit reduction from the problem to the satisfiability problem.
متن کاملApproximating the Shortest Superstring Problem Using de Bruijn Graphs
The best known approximation ratio for the shortest superstring problem is 2 11 23 (Mucha, 2012). In this note, we improve this bound for the case when the length of all input strings is equal to r, for r ≤ 7. For example, for strings of length 3 we get a 1 1 3 -approximation. An advantage of the algorithm is that it is extremely simple both to implement and to analyze. Another advantage is tha...
متن کاملApproximating Shortest Superstring Problem Using de Bruijn Graphs
The best known approximation ratio for the shortest superstring problem is 2 11 23 (Mucha, 2012). In this note, we improve this bound for the case when the length of all input strings is equal to r, for r ≤ 7. E.g., for strings of length 3 we get a 1 1 3 -approximation. An advantage of the algorithm is that it is extremely simple both to implement and to analyze. Another advantage is that it is...
متن کامل